Swift: implement type pruning for dataflow#14592
Swift: implement type pruning for dataflow#14592rdmarsh2 wants to merge 21 commits intogithub:mainfrom
Conversation
|
Note that this currently includes some commits from #14570, which will hopefully be rebased away once that PR is merged. |
geoffw0
left a comment
There was a problem hiding this comment.
Initial review. This looks good to me, though (inevitably on a PR of this size and complexity) I've made quite a lot of minor comments / questions. Testing will be important here - though I gather you've done quite a bit of debugging already. I'll have a quick go with the branch myself later today.
| } | ||
|
|
||
| predicate typeStrongerThan(DataFlowType t1, DataFlowType t2) { none() } | ||
| predicate typeStrongerThan(DataFlowType t1, DataFlowType t2) { |
There was a problem hiding this comment.
What is this predicate for? Should it have QLDoc?
There was a problem hiding this comment.
It's used in a few places in the dataflow library to select the more precise of two types, for instance when runtime overload resolution can tell us that a parameter has a more specific type than the argument that flows to it.
There was a problem hiding this comment.
I requested QLDoc when it was introduced here: #13083 (comment), but it was postponed until we had turned the dataflow library into a shared qlpack. This has now been the case for a while so we should probably add QLDoc to this one now (cc @aschackmull)
| class DataFlowType extends TDataFlowType { | ||
| string toString() { result = "" } | ||
| class DataFlowType extends Type { | ||
| DataFlowType() { this.getCanonicalType() = this } |
There was a problem hiding this comment.
What is the canonical type of a type exactly? (the QLDoc sadly reads "Gets the canonical type of this type.").
There was a problem hiding this comment.
It's the unique type after resolving aliases and desugaring. For instance, if we have typealias MyInt == Int, then [MyInt?] has the canonical type Array<Optional<Int>>
There was a problem hiding this comment.
I'm tempted to add getCanonicalType to Expr, Pattern, and whatever form of Decl makes sense, in the same way that we have getUnspecifiedType for C++
| predicate compatibleTypes(DataFlowType t1, DataFlowType t2) { | ||
| exists(DataFlowType commonSub | | ||
| commonSub.getABaseType*().getCanonicalType() = stripType(t1) and | ||
| commonSub.getABaseType*().getCanonicalType() = stripType(t2) |
There was a problem hiding this comment.
I'm a bit worried about performance here, though probably the pragma[inline] keeps it relatively constrained. I guess DCA will confirm or refute my concerns...
| } | ||
|
|
||
| cached | ||
| private module Cached { |
There was a problem hiding this comment.
I'm a bit concerned about the name of this module if it's no longer cached.
There was a problem hiding this comment.
Oops, I did that for debugging and didn't mean for it to land in the final commit.
| or | ||
| t1 instanceof AnyType | ||
| or | ||
| t2 instanceof AnyType |
There was a problem hiding this comment.
Do we need to account for stuff like AnyType? here?
| f1.getResult() = f2.getResult() and | ||
| not exists(int i | i in [0 .. f1.getNumberOfParamTypes()] | | ||
| f1.getParamType(i) != f2.getParamType(i) | ||
| ) and |
There was a problem hiding this comment.
I expected the = and != in here to be recursive calls to compatibleTypes. Should they be?
There was a problem hiding this comment.
Ideally they would... Unfortunately doing that forces the relation to be materialized, which seems to be infeasible.
| @@ -1062,7 +1116,46 @@ string ppReprType(DataFlowType t) { none() } | |||
| * a node of type `t1` to a node of type `t2`. | |||
There was a problem hiding this comment.
It might be worth stating that this relationship is not commutative, i.e. in some cases compatibleTypes(a, b) is not the same as compatibleTypes(b, a)? I found this surprising (but I think I understand why).
There was a problem hiding this comment.
In the current implementation it's actually commutative. It isn't transitive, though.
There was a problem hiding this comment.
I think the use of implies in the bit for a FunctionType makes it not commutative. compatibleTypes(notThrowingFunction, throwingFunction) might hold while compatibleTypes(throwingFunction, notThrowingFunction) does not.
There was a problem hiding this comment.
compatibleTypes is actually required to be symmetric/commutative:
| or | ||
| this.asExpr() instanceof InOutExpr | ||
| or | ||
| this.asExpr() instanceof InOutToPointerExpr |
There was a problem hiding this comment.
Possibly ForceValueExpr, OptionalSomePattern, AnyTryExpr, VarargExpansionExpr??? I'm really not sure though.
There was a problem hiding this comment.
Yeah, all of those can change the type
| if this.hasSelfParam() | ||
| then result = this.getInterfaceType().(AnyFunctionType).getResult().(AnyFunctionType).getResult() | ||
| else result = this.getInterfaceType().(AnyFunctionType).getResult() | ||
| } |
There was a problem hiding this comment.
This is going to be widely useful. 👍
| @@ -0,0 +1,7 @@ | |||
| private import swift | |||
There was a problem hiding this comment.
I agree with the qldoc checks that this file and the AnyType class should have QLDoc. It can be short and to the point.
| ) and | ||
| (f1.isThrowing() implies f2.isThrowing()) and | ||
| (f1.isAsync() implies f2.isAsync()) | ||
| ) |
There was a problem hiding this comment.
I've just noticed t1, t2 are not bound to f1, f2 in this case.
There was a problem hiding this comment.
oh jeez, that might be why it was blowing up...
There was a problem hiding this comment.
Yeah, I'm seeing performance issues as well.
geoffw0
left a comment
There was a problem hiding this comment.
Sorry for taking a few days to get back to this. I've asked a couple of questions to aid my understanding.
We will need a new DCA run on this PR, when you think it's ready.
| not t instanceof AnyFunctionType | ||
| } or | ||
| TTopFunctionType() or | ||
| TDataFlowFunctionType(DataFlowCallable c) |
There was a problem hiding this comment.
What's the difference between TTopFunctionType and TDataFlowFunctionType? Why do we need both?
There was a problem hiding this comment.
TTopFunctionType allows any function type, DataFlowFunctionType is a specific callable (and can be used to improve precision of lambda call resolution)
| ) | ||
| or | ||
| rk.(ParamReturnKind).getIndex() = -1 and | ||
| // c.getSelfParam().isInout() and |
There was a problem hiding this comment.
Why did this line and 5 lines above get commented out?
| exists(t) and | ||
| exists(pos) and | ||
| result.asType() instanceof AnyType | ||
| } |
There was a problem hiding this comment.
This predicate only applies to nodes that are synthesized from a callback in a modeled function. The AnyType is the top of Swift's type hierarchy (although it's not included in the getABaseType relationship), so we're allowing any type here. It's certainly possible to be more precise, though.
There was a problem hiding this comment.
This looks like a cartesian product between all DataFlowTypes and all ArgumentPositions, right? This was fine before because DataFlowType was a unit type, but this looks like a real CP now, though? Or is the assumption that this fine because there aren't that many ArgumentPositions?
32ed674 to
1ee6a03
Compare
| subType(t1) = subType(t2) | ||
| or | ||
| exists(BoundGenericType bound1, BoundGenericType bound2 | | ||
| stripType(t1.asType()).(BoundGenericType) = bound1 and |
Check warning
Code scanning / CodeQL
Redundant cast
| or | ||
| exists(BoundGenericType bound1, BoundGenericType bound2 | | ||
| stripType(t1.asType()).(BoundGenericType) = bound1 and | ||
| stripType(t2.asType()).(BoundGenericType) = bound2 and |
Check warning
Code scanning / CodeQL
Redundant cast
| /* | ||
| * The Swift `Any` type, which is a supertype of all other types. | ||
| */ |
Check warning
Code scanning / CodeQL
Block comment that is not QLDoc
MathiasVP
left a comment
There was a problem hiding this comment.
Lots of comments (mostly just questions to help me understand the code), but I'm really happy that we're almost there 🎉
| result.asCallable().asSourceCallable() = expr | ||
| or | ||
| result.asCallable().asSourceCallable() = expr.(DeclRefExpr).getDecl() |
There was a problem hiding this comment.
Why do we need both cases here? Is that to handle each callable having their own type?
In any case, a comment on this would be helpful
There was a problem hiding this comment.
That's handling lambdas vs references to functions
| } | ||
|
|
||
| override DataFlowType getTypeImpl() { | ||
| result = getDataFlowType(this.asDefinition().getSourceVariable().asVarDecl().getType()) |
There was a problem hiding this comment.
Not all SourceVariables has a result for asVarDecl (in particular, some of them are related to key paths: https://github.com/github/codeql/blob/main/swift/ql/lib/codeql/swift/dataflow/Ssa.qll#L28).
It seems like these don't have a type now? Should we instead have a getType predicate on SourceVariable (that all instances of the abstract class then implements) so that this predicate could instead be implemented as getDataFlowType(this.asDefinition().getSourceVariable())?
| override DataFlowType getTypeImpl() { | ||
| exists(BoundGenericType dictType, TupleType tupleType | | ||
| dictType = expr.getBase().getType().getCanonicalType() and | ||
| tupleType.getType(0) = dictType.getArgType(0) and | ||
| tupleType.getType(1) = dictType.getArgType(1) and | ||
| tupleType.getNumberOfTypes() = 2 and | ||
| result.asType() = tupleType | ||
| // TODO: handle generic parameter types | ||
| ) | ||
| } |
There was a problem hiding this comment.
This looks very complicated 😨. Since DictionarySubscriptNode is really just an intermediate node that follows a subscript so that we can push both a TupleContent and a CollectionContent: could we instead simply get the type here by forwarding the call to the node representing the subscript expression?
There was a problem hiding this comment.
I don't think so - the type of this node needs to be a tuple type so the read steps work out correctly.
| KeyPathExpr getKeyPathExpr() { result = exit.getScope() } | ||
|
|
||
| override DataFlowType getTypeImpl() { | ||
| result = getDataFlowType(this.getKeyPathExpr().getType().(BoundGenericType).getArgType(1)) |
There was a problem hiding this comment.
Why does this need to be getArgType(1)?
There was a problem hiding this comment.
See the above comment.
|
|
||
| KeyPathComponent getComponent() { result = component } | ||
|
|
||
| override DataFlowType getTypeImpl() { result = getDataFlowType(component.getComponentType()) } |
There was a problem hiding this comment.
Do we really need to implement getTypeImpl on all the post-update nodes? Can the result of getTypeImpl on a post-update node not simply be the result of calling getTypeImpl on the pre-update node?
There was a problem hiding this comment.
Oh, I see what we do actually implement this predicate on PostUpdateNodes: https://github.com/github/codeql/pull/14592/files#diff-514dd0988f7b47e40ff8c1e66ea886d1f947d047da108270ed0ddec75c2856c1R1531. Do we really need this (and similar) overrides then?
There was a problem hiding this comment.
Probably not. I can delete and recheck
There was a problem hiding this comment.
That was actually only on ExprPostUpdateNodes. Moving it to PostUpdateNodeImpl simplifies some things, although it leads to a couple of duplicate overrides. It also fixed a bug with the postupdate nodes on keypaths.
| exists(TupleType tuple1, TupleType tuple2 | | ||
| stripType(t1.asType()) = tuple1 and | ||
| stripType(t2.asType()) = tuple2 and | ||
| compatibleTuples(tuple1, tuple2) |
There was a problem hiding this comment.
Similar question here: I think that compatibleTuples holds when there is a common index in which both tuple types is either the any type or an unbound type. When no such index exists, does that mean they're compatible because of the first condition in compatibleTypes?
| /** | ||
| * Gets the type of this node. | ||
| */ | ||
| DataFlowType getType() { result = this.(NodeImpl).getTypeImpl() } |
There was a problem hiding this comment.
Do we really want to expose the DataFlowType type to users?
| exists(t) and | ||
| exists(pos) and | ||
| result.asType() instanceof AnyType | ||
| } |
There was a problem hiding this comment.
This looks like a cartesian product between all DataFlowTypes and all ArgumentPositions, right? This was fine before because DataFlowType was a unit type, but this looks like a real CP now, though? Or is the assumption that this fine because there aren't that many ArgumentPositions?
| DataFlowType getCallbackReturnType(DataFlowType t, ReturnKind rk) { | ||
| exists(t) and | ||
| exists(rk) and | ||
| result.asType() instanceof AnyType | ||
| } |
There was a problem hiding this comment.
Same here. This looks like a CP between all types and all return kinds. I guess this is fine if the getCallbackParameterType case is also fine?
There was a problem hiding this comment.
It was fine but we don't actually need these anymore - they were part of the old API but not the new one.
| TupleType asUnlabeled() { | ||
| result.getNumberOfTypes() = this.getNumberOfTypes() and | ||
| not exists(int index | result.getType(index) != this.getType(index)) and | ||
| not exists(result.getName(_)) | ||
| } |
There was a problem hiding this comment.
Two questions:
- This looks like something that could blow up with a bad join order, right? That is, it we start by joining together all
TupleTypesthat has the same number of types. Does the joins look okay in this case? - Does the unlabeled version of a tuple type always exist? Or should the
QLDocsay[...], if any.? And if this predicate really is partial: Is that a problem for the way we use it instripType?
|
Today is my last day before Christmas, so I'm probably not going to get chance to review the final version of this. I just want to post something to make it clear I do not expect you to wait for my final 👍. I trust @rdmarsh2 s work and @MathiasVP s judgement, so once you're both happy I'm happy. I do suggest you do another DCA run at some point though, and make sure any remaining regressions in performance or query results are at least understood. |
|
I think there are some comments above that haven't yet been addressed. Please would you reply to them, fix the failing checks, and state when this PR is ready to review again? Is there anything we can do to help? |
This PR implements type pruning for dataflow by adding
DataFlowType, which corresponds to only the canonical types in the Swift type hierarchy, adding a newgetTypemember predicate toDataFlow::Node, and using them to implement thegetNodeTypepredicate that type pruning relies on.